The article explores the architectural changes that enable DeepSeek's models to perform well with fewer resources, focusing on Multi-Head Latent Attention (MLA). It traces the evolution of attention mechanisms, from Bahdanau-style attention to the Transformer's Multi-Head Attention (MHA), and introduces Grouped-Query Attention (GQA) as a way to reduce MHA's key-value cache memory overhead. The article highlights DeepSeek's competitive performance despite lower reported training costs.
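To make the contrast concrete, here is a minimal PyTorch sketch of grouped-query attention (dimensions are illustrative, not taken from the article): several query heads share one key/value head, so far fewer keys and values need to be cached per token than under MHA.

```python
# Minimal sketch of Grouped-Query Attention (GQA); illustrative dimensions only.
# The key point: K/V use n_kv_heads < n_heads, shrinking the per-token KV cache
# that MHA would otherwise keep for every head.
import torch
import torch.nn.functional as F

d_model, n_heads, n_kv_heads, head_dim = 512, 8, 2, 64   # 8 query heads share 2 KV heads
x = torch.randn(1, 16, d_model)                           # (batch, seq_len, d_model)

w_q = torch.nn.Linear(d_model, n_heads * head_dim, bias=False)
w_k = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)  # smaller projection
w_v = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)  # smaller projection

q = w_q(x).view(1, 16, n_heads, head_dim).transpose(1, 2)     # (1, 8, 16, 64)
k = w_k(x).view(1, 16, n_kv_heads, head_dim).transpose(1, 2)  # (1, 2, 16, 64) -> cached
v = w_v(x).view(1, 16, n_kv_heads, head_dim).transpose(1, 2)  # (1, 2, 16, 64) -> cached

# Each group of n_heads // n_kv_heads query heads attends to the same K/V head.
k = k.repeat_interleave(n_heads // n_kv_heads, dim=1)          # (1, 8, 16, 64)
v = v.repeat_interleave(n_heads // n_kv_heads, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # (1, 8, 16, 64)
out = out.transpose(1, 2).reshape(1, 16, n_heads * head_dim)
```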
A comprehensive guide to Large Language Models by Damien Benveniste, covering topics ranging from transformer architectures to LLM deployment.
This tutorial demonstrates how to fine-tune the Llama-2 7B Chat model for Python code generation using QLoRA, gradient checkpointing, and SFTTrainer with the Alpaca-14k dataset.
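A rough outline of that setup, sketched with peft/trl; argument names vary between library versions, and the dataset identifier below is a placeholder rather than the tutorial's actual Alpaca-14k id, so treat this as a sketch rather than the tutorial's exact recipe.

```python
# Sketch of QLoRA fine-tuning of Llama-2 7B Chat with SFTTrainer.
# Hedged: trl/peft/transformers APIs differ across versions; dataset id is a placeholder.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig
from trl import SFTTrainer

model_id = "meta-llama/Llama-2-7b-chat-hf"

bnb_config = BitsAndBytesConfig(            # 4-bit NF4 quantization (the "Q" in QLoRA)
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
model.gradient_checkpointing_enable()       # trade recompute for activation memory

peft_config = LoraConfig(                   # low-rank adapters on attention projections
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)

dataset = load_dataset("alpaca-14k", split="train")  # placeholder id for the Alpaca-14k dataset

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    args=TrainingArguments(output_dir="llama2-7b-python-qlora",
                           per_device_train_batch_size=2,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1),
)
trainer.train()
```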
Qwen2.5-VL, the latest vision-language model from Qwen, showcases enhanced image recognition, agentic behavior, video comprehension, document parsing, and more. It outperforms its predecessors across a range of vision-language benchmarks while improving efficiency.
This article provides a comprehensive guide on the basics of BERT (Bidirectional Encoder Representations from Transformers) models. It covers the architecture, use cases, and practical implementations, helping readers understand how to leverage BERT for natural language processing tasks.
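For readers who just want to see BERT in action, here is a small sketch using the Hugging Face transformers API; the bert-base-uncased checkpoint is an assumption, not necessarily the one the guide uses.

```python
# Minimal sketch of using a pretrained BERT with Hugging Face transformers.
# Checkpoint name (bert-base-uncased) is an assumption.
import torch
from transformers import pipeline, AutoTokenizer, AutoModel

# 1) Masked-token prediction, the objective BERT was pretrained on.
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("The capital of France is [MASK].")[0]["token_str"])  # e.g. "paris"

# 2) Extracting contextual embeddings for downstream tasks (classification, retrieval, ...).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("BERT encodes text bidirectionally.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # (batch, seq_len, 768)
sentence_vec = hidden[:, 0]                      # [CLS] token embedding
```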
The article discusses the implications of DeepSeek's R1 model launch, highlighting five key lessons: the shift from pattern recognition to reasoning in AI models, the changing economics of AI, the coexistence of proprietary and open-source models, innovation driven by silicon scarcity, and the ongoing advantages of proprietary models despite DeepSeek's impact.
The article introduces test-time scaling, an approach to language modeling that improves performance by spending additional compute at inference time. The authors pair a curated dataset with a technique called budget forcing, which controls how much compute the model uses and lets it double-check its answers to improve reasoning. The approach is demonstrated with the Qwen2.5-32B-Instruct language model, showing significant improvements on competition math questions.
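A conceptual sketch of budget forcing as described in the paper: generation is extended with "Wait" until a minimum thinking budget is reached, and cut off with the end-of-thinking delimiter once a maximum budget is hit. The generate_until and count_tokens helpers below are hypothetical stand-ins, not a real API.

```python
# Conceptual sketch of budget forcing (s1-style); helper functions are hypothetical.
def budget_forced_generation(prompt, min_tokens, max_tokens, end_think="</think>"):
    text, used = prompt, 0
    while True:
        chunk = generate_until(text, stop=end_think, max_new_tokens=max_tokens - used)
        text += chunk
        used += count_tokens(chunk)
        if used >= max_tokens:
            text += end_think   # maximum budget hit: force the end of thinking
            break
        if used >= min_tokens:
            text += end_think   # model stopped after enough thinking: accept it
            break
        text += " Wait"         # below the minimum budget: suppress the stop and
                                # nudge the model to re-check its reasoning
    return generate_until(text, stop=None, max_new_tokens=512)  # final answer
```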
This repository collects resources for the paper 's1: Simple test-time scaling', a minimal recipe for test-time scaling with strong reasoning performance. It covers the released artifacts, repository structure, inference, training, evaluation, data, visuals, and citation details.
The article explores the DeepSeek-R1 models, focusing on how reinforcement learning (RL) is used to develop advanced reasoning capabilities in AI. It discusses the DeepSeek-R1-Zero model, which learns reasoning without supervised fine-tuning, and the DeepSeek-R1 model, which combines RL with a small amount of supervised data for improved performance. The article highlights the use of distillation to transfer reasoning patterns to smaller models and addresses challenges and future directions in RL for AI.
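As a rough illustration of the distillation step, the outline below samples reasoning traces from the large RL-trained model and fine-tunes a smaller model on them with plain supervised learning; every function name here is a hypothetical placeholder, not DeepSeek's actual pipeline.

```python
# Conceptual outline of reasoning distillation: collect chain-of-thought completions
# from the large RL-trained teacher and use them as supervised fine-tuning data for
# a smaller student. All callables (generate, fine_tune, passes_quality_filter) are
# hypothetical placeholders.
def build_distillation_set(teacher, prompts, samples_per_prompt=4):
    records = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            completion = teacher.generate(prompt)      # reasoning trace + final answer
            if passes_quality_filter(completion):      # e.g. correctness / format checks
                records.append({"prompt": prompt, "completion": completion})
    return records

def distill(student, teacher, prompts):
    data = build_distillation_set(teacher, prompts)
    student.fine_tune(data)                            # plain SFT on teacher outputs;
    return student                                     # no RL needed on the student
```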
Hugging Face researchers developed an open-source AI research agent called 'Open Deep Research' in 24 hours, aiming to match OpenAI's Deep Research. The project demonstrates the potential of agent frameworks to enhance AI model capabilities, achieving 55.15% accuracy on the GAIA benchmark. The initiative highlights the rapid development and collaborative nature of open-source AI projects.